Seeding Statistical Machine Translation with Translation Memory Output through Tree-Based Structural Alignment
نویسندگان
چکیده
With the steadily increasing demand for high-quality translation, the localisation industry is constantly searching for technologies that would increase translator throughput, with the current focus on the use of high-quality Statistical Machine Translation (SMT) as a supplement to the established Translation Memory (TM) technology. In this paper we present a novel modular approach that utilises state-of-the-art sub-tree alignment to pick out pre-translated segments from a TM match and seed with them an SMT system to produce a final translation. We show that the presented system can outperform pure SMT when a good TM match is found. It can also be used in a Computer-Aided Translation (CAT) environment to present almost perfect translations to the human user with markup highlighting the segments of the translation that need to be checked manually for correctness.
منابع مشابه
Maximising TM Performance through Sub-Tree Alignment and SMT
With the steadily increasing demand for highquality translation, the localisation industry is constantly searching for technologies that would increase translator throughput, in particular focusing on the use of high-quality Statistical Machine Translation (SMT) supplementing the established Translation Memory (TM) technology. In this paper, we present a novel modular approach that utilises sta...
متن کاملTree-based Hybrid Machine Translation
I present an automatic post-editing approach that combines translation systems which produce syntactic trees as output. The nodes in the generation tree and targetside SCFG tree are aligned and form the basis for computing structural similarity. Structural similarity computation aligns subtrees and based on this alignment, subtrees are substituted to create more accurate translations. Two diffe...
متن کاملCombining Models for the Alignment of Parallel Syntactic Trees
The alignment of syntactic trees is the task of aligning the internal and leaf nodes of two sentences in different languages structured as trees. The output of the alignment can be used, for instance, as knowledge resource for learning translation rules (for rule-based machine translation systems) or models (for statistical machine translation systems). This paper presents some experiments carr...
متن کاملTuning Syntactically Enhanced Word Alignment for Statistical Machine Translation
We introduce a syntactically enhanced word alignment model that is more flexible than state-of-the-art generative word alignment models and can be tuned according to different end tasks. First of all, this model takes the advantages of both unsupervised and supervised word alignment approaches by obtaining anchor alignments from unsupervised generative models and seeding the anchor alignments i...
متن کاملN-Gram-Based Statistical Machine Translation versus Syntax Augmented Machine Translation: Comparison and System Combination
In this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and UPC-TALP N-gram-based Statistical Machine Translation (SMT). SAMT is a hierarchical syntax-driven translation system underlain by a phrase-based model and a target part parse tree. In N-gram-based SMT, the translation process is based on bilingual ...
متن کامل